Optimized Multiprocessor Communication and Synchronization Using a Programmable Protocol Engine

Author

  • John Heinlein

Abstract

In recent years, multiprocessor designs have converged towards a unified hardware architecture despite supporting different communication abstractions. The implementation of these communication abstractions and the associated protocols in hardware is complex, inflexible, and error-prone. For these reasons, some recent designs have employed a programmable controller to manage system communication. One particular focus of these designs is implementing cache coherence protocols in software. This dissertation argues that a programmable communication controller that provides cache coherence can also effectively support block transfer and synchronization protocols. This research is part of the FLASH project, a major focus of which is exploring the integration of multiple communication protocols in a single multiprocessor architecture. In our analysis, we examine the needs of protocols other than cache coherence to identify the requirements they share. The interface between the processor and controller is one critical issue in these protocols, so we propose techniques to export such protocols reliably, at low overhead, and without system calls. Unlike most prior studies, our approach supports a modern operating system with features like multiprogramming, protection, and virtual memory. Our study focuses in detail on two classes of communication that are important for large-scale multiprocessors: block transfer and synchronization using locks and barriers. In particular, we attempt to improve the performance of these classes of communication as compared to implementations using only software on top of shared memory. For each protocol, we identify the critical metrics of performance, explore the limitations of existing techniques, and then present our implementation, which is tailored to leverage the programmable communication controller. We evaluate each protocol in isolation, in the context of microbenchmarks, and within a variety of applications.
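For reference, a common "software on top of shared memory" barrier of the kind the abstract uses as a baseline is the centralized sense-reversing barrier. The Python sketch that follows is a minimal illustration under stated assumptions, not the dissertation's implementation: the class name SenseReversingBarrier and the demo are hypothetical, and a threading.Lock stands in for the atomic fetch-and-decrement that a real multiprocessor would provide.

```python
import threading
import time


class SenseReversingBarrier:
    """Centralized sense-reversing barrier (hypothetical software-only sketch).

    The shared counter and flag are plain attributes; a threading.Lock models
    the atomic fetch-and-decrement a real shared-memory machine would supply.
    """

    def __init__(self, num_threads: int) -> None:
        self.num_threads = num_threads
        self.count = num_threads         # arrivals still expected this episode
        self.global_sense = False        # flipped by the last thread to arrive
        self._atomic = threading.Lock()  # stands in for an atomic RMW operation
        self._local = threading.local()  # holds each thread's private sense

    def wait(self) -> None:
        # Each thread flips its private sense at the start of every episode,
        # so the same barrier object can be reused without reinitialization.
        local_sense = not getattr(self._local, "sense", False)
        self._local.sense = local_sense

        with self._atomic:  # fetch-and-decrement on the shared counter
            self.count -= 1
            is_last = self.count == 0

        if is_last:
            # Reset the counter before releasing waiters, so no thread can
            # enter the next episode and observe a stale count.
            self.count = self.num_threads
            self.global_sense = local_sense
        else:
            # Spin on the shared flag. On a cache-coherent machine each waiter
            # would spin in its own cache until the release write invalidates it.
            while self.global_sense != local_sense:
                time.sleep(0)  # yield so other Python threads can run


if __name__ == "__main__":
    N, PHASES = 4, 3
    barrier = SenseReversingBarrier(N)
    log, log_lock = [], threading.Lock()

    def worker() -> None:
        for phase in range(PHASES):
            with log_lock:
                log.append(phase)  # record the work done in this phase
            barrier.wait()         # no thread starts phase+1 until all finish

    threads = [threading.Thread(target=worker) for _ in range(N)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(log)  # → [0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2]
```

Every arrival serializes on one shared counter, and the release write invalidates a flag cached by every waiter. On a large machine that serialization and coherence traffic grows with the number of processors, which is one plausible reason a software-only barrier leaves room for a controller-assisted implementation.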
We find that embedding advanced communication and synchronization features in a programmable controller has a number of advantages. For example, the block transfer protocol improves transfer performance in some cases, enables the processor to perform other work in parallel, and ...

Similar resources

The Flash Multiprocessor: Designing a Flexible and Scalable System

The choice of a communication paradigm, or protocol, is central to the design of a large-scale multiprocessor system. Unlike traditional multiprocessors, the FLASH machine uses a programmable node controller, called MAGIC, to implement all protocol processing. The architecture of the MAGIC chip allows FLASH to support multiple communication paradigms, in particular, cache-coherent shared memory...

Computer Network Time Synchronization using a Low Cost GPS Engine

Accurate and reliable time is necessary for financial and legal transactions, transportation, distribution systems, and many other applications. Time synchronization protocols such as NTP (the Network Time Protocol) have kept the clocks of such applications synchronized to each other for many years. Nowadays there are many commercial GPS-based NTP time server products on the market, but they almost ...

Distributed Simulation and Profiling of Multiprocessor Systems on a Chip

Embedded systems – multiprocessor systems on a chip with application-specific instruction-set processors (ASIPs) – have become an indivisible part of our everyday lives. They are everywhere. Therefore, a powerful and flexible way to design and simulate these systems is needed. The simulators of ASIPs are created using an architecture description language called ISAC. In this paper, the basic concept...

Evaluating the Potential of Programmable Multiprocessor Cache Controllers

The next generation of scalable parallel systems (e.g., machines by KSR, Convex, and others) will have shared memory supported in hardware, unlike most current-generation machines (e.g., offerings by Intel, nCube, and Thinking Machines). However, current shared memory architectures are constrained by the fact that their cache controllers are hardwired and inflexible, which limits the range of programs that can a...

Publication date: 1998